Xuebo Liu

PACE: Defying the Scaling Hypothesis of Exploration in Iterative Alignment for Mathematical Reasoning

Feb 05, 2026

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Feb 05, 2026

Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning

Feb 02, 2026

CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning

Jan 30, 2026

Exposing the Cracks: Vulnerabilities of Retrieval-Augmented LLM-based Machine Translation

Oct 01, 2025

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs

Jul 24, 2025

REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models

May 26, 2025

Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments

May 23, 2025

Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning

May 22, 2025

AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration

Mar 24, 2025